The vaccination data, the hospital case data, and the wastewater
signal data have been loaded, cleaned, and merged into one final
dataframe, final_data.
Of all the variables, the wastewater signal shows the highest correlation with hospitalization, as can be seen in the correlation plot below. To improve the correlation of the vaccination variables with hospitalization, a log transformation of the data can be tried.
Next, the correlation between lagged values of the wastewater signal and current hospital cases is checked. Lags of up to 100 were evaluated. The correlation is highest (about 0.5) for the first few lags, declines steadily after that, and turns slightly negative from about lag 34 onward.
## [1] "Correlation for lag of2"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.51
## [1] "Correlation for lag of3"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.5
## [1] "Correlation for lag of4"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.5
## [1] "Correlation for lag of5"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.51
## [1] "Correlation for lag of6"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.5
## [1] "Correlation for lag of7"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.49
## [1] "Correlation for lag of8"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.47
## [1] "Correlation for lag of9"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.46
## [1] "Correlation for lag of10"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.45
## [1] "Correlation for lag of11"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.42
## [1] "Correlation for lag of12"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.41
## [1] "Correlation for lag of13"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.39
## [1] "Correlation for lag of14"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.37
## [1] "Correlation for lag of15"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.36
## [1] "Correlation for lag of16"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.33
## [1] "Correlation for lag of17"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.31
## [1] "Correlation for lag of18"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.29
## [1] "Correlation for lag of19"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.27
## [1] "Correlation for lag of20"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.24
## [1] "Correlation for lag of21"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.23
## [1] "Correlation for lag of22"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.21
## [1] "Correlation for lag of23"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.2
## [1] "Correlation for lag of24"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.17
## [1] "Correlation for lag of25"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.15
## [1] "Correlation for lag of26"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.13
## [1] "Correlation for lag of27"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.12
## [1] "Correlation for lag of28"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.1
## [1] "Correlation for lag of29"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.08
## [1] "Correlation for lag of30"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.07
## [1] "Correlation for lag of31"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.06
## [1] "Correlation for lag of32"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.04
## [1] "Correlation for lag of33"
## N1_N2_avg
## observed_census_ICU_p_acute_care 0.01
## [1] "Correlation for lag of34"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.01
## [1] "Correlation for lag of35"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.04
## [1] "Correlation for lag of36"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.06
## [1] "Correlation for lag of37"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.07
## [1] "Correlation for lag of38"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.07
## [1] "Correlation for lag of39"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.09
## [1] "Correlation for lag of40"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.1
## [1] "Correlation for lag of41"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.11
## [1] "Correlation for lag of42"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.11
## [1] "Correlation for lag of43"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.11
## [1] "Correlation for lag of44"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.11
## [1] "Correlation for lag of45"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.12
## [1] "Correlation for lag of46"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.12
## [1] "Correlation for lag of47"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.12
## [1] "Correlation for lag of48"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.12
## [1] "Correlation for lag of49"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.13
## [1] "Correlation for lag of50"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.13
## [1] "Correlation for lag of51"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.15
## [1] "Correlation for lag of52"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.15
## [1] "Correlation for lag of53"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.16
## [1] "Correlation for lag of54"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.16
## [1] "Correlation for lag of55"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.17
## [1] "Correlation for lag of56"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.16
## [1] "Correlation for lag of57"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.16
## [1] "Correlation for lag of58"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.18
## [1] "Correlation for lag of59"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.18
## [1] "Correlation for lag of60"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.19
## [1] "Correlation for lag of61"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.16
## [1] "Correlation for lag of62"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.17
## [1] "Correlation for lag of63"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.16
## [1] "Correlation for lag of64"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.16
## [1] "Correlation for lag of65"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.16
## [1] "Correlation for lag of66"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.16
## [1] "Correlation for lag of67"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.16
## [1] "Correlation for lag of68"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.15
## [1] "Correlation for lag of69"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.15
## [1] "Correlation for lag of70"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.15
## [1] "Correlation for lag of71"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.14
## [1] "Correlation for lag of72"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.15
## [1] "Correlation for lag of73"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.17
## [1] "Correlation for lag of74"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.18
## [1] "Correlation for lag of75"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.18
## [1] "Correlation for lag of76"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.19
## [1] "Correlation for lag of77"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.2
## [1] "Correlation for lag of78"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.2
## [1] "Correlation for lag of79"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.21
## [1] "Correlation for lag of80"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.2
## [1] "Correlation for lag of81"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.19
## [1] "Correlation for lag of82"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.19
## [1] "Correlation for lag of83"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.19
## [1] "Correlation for lag of84"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.2
## [1] "Correlation for lag of85"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.19
## [1] "Correlation for lag of86"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.2
## [1] "Correlation for lag of87"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.2
## [1] "Correlation for lag of88"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.2
## [1] "Correlation for lag of89"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.21
## [1] "Correlation for lag of90"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.21
## [1] "Correlation for lag of91"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.2
## [1] "Correlation for lag of92"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.2
## [1] "Correlation for lag of93"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.19
## [1] "Correlation for lag of94"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.19
## [1] "Correlation for lag of95"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.18
## [1] "Correlation for lag of96"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.16
## [1] "Correlation for lag of97"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.15
## [1] "Correlation for lag of98"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.13
## [1] "Correlation for lag of99"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.12
## [1] "Correlation for lag of100"
## N1_N2_avg
## observed_census_ICU_p_acute_care -0.11
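The lag check above can be sketched in base R as follows. This is a minimal illustration, not the code that produced the output: the data are synthetic, the names `signal`, `census`, and `lag_cor` are hypothetical, and it assumes the rows are equally spaced and ordered by date.

```r
# Synthetic stand-in for the two series (assumption: equally spaced, date-ordered rows)
set.seed(1)
n <- 300
signal <- as.numeric(arima.sim(list(ar = 0.9), n = n))  # hypothetical wastewater signal
# Hypothetical hospital census built to trail the signal by 5 rows, plus noise
census <- c(rep(NA, 5), head(signal, -5)) + rnorm(n, sd = 0.5)

# Correlation between the signal k rows earlier and the current census
lag_cor <- function(x, y, k) {
  x_lagged <- c(rep(NA, k), head(x, -k))  # shift x forward by k rows
  cor(x_lagged, y, use = "complete.obs")  # drop rows made incomplete by the shift
}

lags <- 1:30
cors <- sapply(lags, function(k) lag_cor(signal, census, k))
best_lag <- lags[which.max(cors)]

print(round(cors[1:10], 2))
print(best_lag)
```

Collecting the correlations in one vector, rather than printing each lag separately, makes it easy to plot correlation against lag and read off where it peaks.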
Correlation between waste water signal and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_5_1_dose and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_12_1_dose and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_18_1_dose and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_30_1_dose and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_40_1_dose and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_50_1_dose and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_60_1_dose and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_70_1_dose and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_80_1_dose and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_5_2_dose and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_12_2_dose and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_18_2_dose and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_30_2_dose and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_40_2_dose and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_50_2_dose and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_60_2_dose and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_70_2_dose and observed_census_ICU_p_acute_care (hospitalizations).
Correlation between Percent_of_Ottawa_residents_80_2_dose and observed_census_ICU_p_acute_care (hospitalizations).
The scatter plots do not show a perfectly linear relationship between hospitalization and the other variables, especially the vaccination data. The relationship looks non-linear, so a piecewise linear model may be a better approach. To compare predictive accuracy, we fit both a Simple Linear Regression model and a MARS (Multivariate Adaptive Regression Splines) model below.
A Simple Linear Regression model is fit for predictive analysis (with 19 predictors it is, strictly speaking, a multiple linear regression). The data is divided into a training set and a test set; the model is fit on the training data and validated on the test data using the metrics reported below. The training set contains 97% of the data. The columns used are:
## [1] "Percent_of_Ottawa_residents_5_1_dose"
## [2] "Percent_of_Ottawa_residents_12_1_dose"
## [3] "Percent_of_Ottawa_residents_18_1_dose"
## [4] "Percent_of_Ottawa_residents_30_1_dose"
## [5] "Percent_of_Ottawa_residents_40_1_dose"
## [6] "Percent_of_Ottawa_residents_50_1_dose"
## [7] "Percent_of_Ottawa_residents_60_1_dose"
## [8] "Percent_of_Ottawa_residents_70_1_dose"
## [9] "Percent_of_Ottawa_residents_80_1_dose"
## [10] "Percent_of_Ottawa_residents_5_2_dose"
## [11] "Percent_of_Ottawa_residents_12_2_dose"
## [12] "Percent_of_Ottawa_residents_18_2_dose"
## [13] "Percent_of_Ottawa_residents_30_2_dose"
## [14] "Percent_of_Ottawa_residents_40_2_dose"
## [15] "Percent_of_Ottawa_residents_50_2_dose"
## [16] "Percent_of_Ottawa_residents_60_2_dose"
## [17] "Percent_of_Ottawa_residents_70_2_dose"
## [18] "Percent_of_Ottawa_residents_80_2_dose"
## [19] "observed_census_ICU_p_acute_care"
## [20] "N1_N2_avg"
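A minimal sketch of the split-and-fit step described above, in base R. The data frame here is a small synthetic stand-in with only two of the predictors, and the random (rather than chronological) split and the seed are assumptions, not details confirmed by the report.

```r
set.seed(42)

# Synthetic stand-in for final_data (values are made up for illustration)
n <- 400
final_data <- data.frame(
  N1_N2_avg = runif(n, 0, 0.002),
  Percent_of_Ottawa_residents_5_1_dose = runif(n, 0, 90)
)
final_data$observed_census_ICU_p_acute_care <-
  20 + 12000 * final_data$N1_N2_avg +
  0.5 * final_data$Percent_of_Ottawa_residents_5_1_dose +
  rnorm(n, sd = 5)

# Assumed random 97% / 3% split
train_idx <- sample(seq_len(n), size = round(0.97 * n))
train <- final_data[train_idx, ]
test  <- final_data[-train_idx, ]

# Regress the response on every other column, as in the report's lm() call
fit <- lm(observed_census_ICU_p_acute_care ~ ., data = train)
print(summary(fit))

# Predictions for the held-out rows
test_pred <- predict(fit, newdata = test)
```

With only 3% of the rows held out, the test metrics rest on very few observations, so they can vary noticeably from one random split to another.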
Summary of the Linear Regression model with beta coefficients of the variables used for regression analysis to predict hospitalization and their statistical significance is presented below:
##
## Call:
## lm(formula = observed_census_ICU_p_acute_care ~ ., data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -79.695 -5.428 -0.312 5.272 45.365
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.019e+01 2.147e+00 9.402 < 2e-16
## Percent_of_Ottawa_residents_5_1_dose 4.804e-01 6.883e-02 6.980 1.32e-11
## Percent_of_Ottawa_residents_12_1_dose 6.744e-01 7.903e-01 0.853 0.394019
## Percent_of_Ottawa_residents_18_1_dose 1.198e+00 1.651e+00 0.725 0.468696
## Percent_of_Ottawa_residents_30_1_dose -1.234e+00 1.708e+00 -0.722 0.470469
## Percent_of_Ottawa_residents_40_1_dose -7.900e-01 1.026e+00 -0.770 0.441943
## Percent_of_Ottawa_residents_50_1_dose -1.041e+00 7.185e-01 -1.448 0.148364
## Percent_of_Ottawa_residents_60_1_dose 6.934e-01 3.238e-01 2.141 0.032883
## Percent_of_Ottawa_residents_70_1_dose 8.380e-01 1.496e-01 5.602 4.09e-08
## Percent_of_Ottawa_residents_80_1_dose 1.956e-01 1.088e-01 1.797 0.073096
## Percent_of_Ottawa_residents_5_2_dose 3.274e+00 2.334e-01 14.030 < 2e-16
## Percent_of_Ottawa_residents_12_2_dose -4.387e-01 5.395e-01 -0.813 0.416676
## Percent_of_Ottawa_residents_18_2_dose 8.881e-02 1.558e+00 0.057 0.954577
## Percent_of_Ottawa_residents_30_2_dose 3.224e+00 1.989e+00 1.621 0.105931
## Percent_of_Ottawa_residents_40_2_dose -6.732e+00 1.913e+00 -3.520 0.000485
## Percent_of_Ottawa_residents_50_2_dose 7.473e+00 1.958e+00 3.816 0.000158
## Percent_of_Ottawa_residents_60_2_dose -3.610e+00 1.256e+00 -2.875 0.004274
## Percent_of_Ottawa_residents_70_2_dose 7.088e-02 6.899e-01 0.103 0.918222
## Percent_of_Ottawa_residents_80_2_dose -4.442e-01 5.590e-01 -0.795 0.427256
## N1_N2_avg 1.214e+04 3.151e+03 3.853 0.000137
##
## (Intercept) ***
## Percent_of_Ottawa_residents_5_1_dose ***
## Percent_of_Ottawa_residents_12_1_dose
## Percent_of_Ottawa_residents_18_1_dose
## Percent_of_Ottawa_residents_30_1_dose
## Percent_of_Ottawa_residents_40_1_dose
## Percent_of_Ottawa_residents_50_1_dose
## Percent_of_Ottawa_residents_60_1_dose *
## Percent_of_Ottawa_residents_70_1_dose ***
## Percent_of_Ottawa_residents_80_1_dose .
## Percent_of_Ottawa_residents_5_2_dose ***
## Percent_of_Ottawa_residents_12_2_dose
## Percent_of_Ottawa_residents_18_2_dose
## Percent_of_Ottawa_residents_30_2_dose
## Percent_of_Ottawa_residents_40_2_dose ***
## Percent_of_Ottawa_residents_50_2_dose ***
## Percent_of_Ottawa_residents_60_2_dose **
## Percent_of_Ottawa_residents_70_2_dose
## Percent_of_Ottawa_residents_80_2_dose
## N1_N2_avg ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.32 on 379 degrees of freedom
## Multiple R-squared: 0.8708, Adjusted R-squared: 0.8643
## F-statistic: 134.4 on 19 and 379 DF, p-value: < 2.2e-16
Checking residuals in the Simple Linear Regression model and creating a plot of the residuals.
## res
## 329 4.517561
## 313 -3.696496
## 95 43.118046
## 209 -10.710530
## 351 1.249841
## 317 2.157685
The errors in the linear regression model are not completely normally distributed and show a skew towards the right. The Q-Q plot, histogram, and residual plots reflect this.
Calculating metrics from actual vs predicted on both train and test dataset.
## predicted actual
## 121 93.09910 93
## 193 19.70279 39
## 205 23.92299 21
## 213 23.24561 21
## 219 25.07635 26
## 230 29.62154 28
Root Mean Squared Error for the Simple Linear Regression Model on test set:
## [1] 18.98251
MAPE on test set
## [1] 0.3227292
Root Mean Squared Error for the Simple Linear Regression Model on train set:
## [1] 12.98568
MAPE on train set
## [1] 0.4839652
Standard deviation of the actual data
## [1] 35.92902
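The two error metrics reported above can be written as small helper functions. This is a sketch: `rmse` and `mape` are hypothetical names, and MAPE is returned as a fraction rather than a percentage, which matches the scale of the values printed in this report. The example vectors are the six actual/predicted rows shown earlier, so the result will not equal the full test-set figures.

```r
# Root mean squared error: square root of the mean squared difference
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Mean absolute percentage error, as a fraction (undefined if any actual value is 0)
mape <- function(actual, predicted) {
  mean(abs((actual - predicted) / actual))
}

# The six actual vs predicted rows printed above
actual    <- c(93, 39, 21, 21, 26, 28)
predicted <- c(93.09910, 19.70279, 23.92299, 23.24561, 25.07635, 29.62154)

print(rmse(actual, predicted))
print(mape(actual, predicted))
```

Comparing the RMSE with the standard deviation of the actual data gives a sense of scale: an RMSE well below the standard deviation means the model does better than always predicting the mean.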
Plots comparing actual test data and predicted test data by simple linear regression model
The MARS model uses piecewise linear regression to fit the data. It automatically finds the knots (breakpoints) in each predictor at which the slope of the fit changes.
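A piecewise linear term in MARS is built from hinge functions of the form h(z) = max(0, z). The sketch below evaluates one mirrored pair of hinges around a knot at 78, using the two coefficients reported for Percent_of_Ottawa_residents_40_1_dose in the fitted model summary later in this report. The helper names `hinge` and `pair_contribution` are hypothetical, and the result is only this pair's contribution to the prediction, not the full model.

```r
# Hinge function: zero on one side of the knot, linear on the other
hinge <- function(z) pmax(0, z)

knot    <- 78
b_left  <- 1.244096  # coefficient of h(78 - x), active when x < 78
b_right <- 1.470696  # coefficient of h(x - 78), active when x > 78

# Contribution of this hinge pair to the prediction at predictor value x
pair_contribution <- function(x) {
  b_left * hinge(knot - x) + b_right * hinge(x - knot)
}

print(pair_contribution(c(70, 78, 80)))
```

The contribution is zero exactly at the knot and grows linearly on either side with a different slope, which is how the model bends the fitted line at that value.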
The model has been fit on the training data, which consists of 97% of the data.
An automated hyperparameter search is done using a grid search with 10-fold cross-validation. The data is divided into 10 folds of roughly equal size; each fold in turn is held out for validation while the model is trained on the remaining nine. The combination of hyperparameters that gives the lowest average error across the 10 folds is selected as the most appropriate.
The cross-validation grid search covers interaction degrees from 1 to 4,
corresponding to the degree hyperparameter in the model, and
5 to 25 terms (in steps of 5) to retain in the final pruned model,
corresponding to the nprune hyperparameter in the
model.
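The grid described above could be set up roughly as follows with the caret and earth packages. This is a sketch under assumptions: the data are synthetic, the names `hyper_grid` and `cv_fit` are hypothetical, and the seed is arbitrary. The fitting step runs only if both packages are installed.

```r
# Grid of candidate hyperparameters: 4 degrees x 5 nprune values = 20 combinations
hyper_grid <- expand.grid(degree = 1:4, nprune = seq(5, 25, by = 5))

if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("earth", quietly = TRUE)) {
  set.seed(123)

  # Synthetic predictors and a response with two built-in breakpoints
  n <- 200
  x <- data.frame(x1 = runif(n, 0, 100), x2 = runif(n, 0, 100))
  y <- 10 + 0.5 * pmax(0, x$x1 - 60) + 0.2 * pmax(0, 40 - x$x2) + rnorm(n)

  # 10-fold cross-validation over the grid, selecting on RMSE
  cv_fit <- caret::train(
    x = x, y = y,
    method = "earth",
    metric = "RMSE",
    trControl = caret::trainControl(method = "cv", number = 10),
    tuneGrid = hyper_grid
  )

  print(cv_fit$bestTune)  # degree and nprune with the lowest cross-validated RMSE
}
```

When several grid points tie on RMSE, which can happen when a larger nprune no longer changes the pruned model, the reported best combination is only one of the equivalent choices.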
#### Summary of the cross-validation performed using the MARS model
## Multivariate Adaptive Regression Spline
##
## 399 samples
## 19 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 360, 360, 359, 360, 359, 358, ...
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 5 11.394476 0.8980678 8.840158
## 1 10 5.636099 0.9759868 4.318042
## 1 15 4.874020 0.9823783 3.691912
## 1 20 4.701056 0.9835082 3.580371
## 1 25 4.707047 0.9834735 3.583368
## 2 5 8.971723 0.9373860 6.978411
## 2 10 5.455245 0.9776409 4.202677
## 2 15 4.625841 0.9843087 3.396190
## 2 20 4.653481 0.9842950 3.380894
## 2 25 4.653481 0.9842950 3.380894
## 3 5 9.623049 0.9287439 7.158071
## 3 10 5.637618 0.9759725 4.377765
## 3 15 4.742192 0.9836025 3.524740
## 3 20 4.591559 0.9845973 3.382342
## 3 25 4.591559 0.9845973 3.382342
## 4 5 9.623049 0.9287439 7.158071
## 4 10 5.637618 0.9759725 4.377765
## 4 15 4.742192 0.9836025 3.524740
## 4 20 4.591559 0.9845973 3.382342
## 4 25 4.591559 0.9845973 3.382342
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 20 and degree = 3.
Below the model with the lowest RMSE from cross validation search is displayed.
## degree nprune RMSE Rsquared MAE RMSESD RsquaredSD MAESD
## 1 3 20 4.591559 0.9845973 3.382342 0.9631243 0.006157929 0.478529
The model has selected 8 of the 19 predictor variables, with interaction terms of up to degree 3. The final model has 16 terms, including the intercept. The R-squared is 98.8% (GRSq 98.5%). The RMSE is 4.01 on the training set and 4.59 on average across the cross-validation folds. A very high percentage of the variance in the response is therefore explained by the predictor variables in the model.
For the Simple Linear Regression model, the R-squared was 87.1% (adjusted 86.4%), so it explains only about 87% of the variance in the response. Its RMSE on the training set was 12.99.
The MARS model definitely outperforms the simple linear regression model with a lower RMSE and higher Rsquare.
## Call: earth(x=data.frame[399,19], y=c(16,42,121,20,...), keepxy=TRUE, degree=3,
## nprune=20)
##
## coefficients
## (Intercept) 94.738719
## h(78-Percent_of_Ottawa_residents_40_1_dose) 1.244096
## h(Percent_of_Ottawa_residents_40_1_dose-78) 1.470696
## h(9-Percent_of_Ottawa_residents_5_2_dose) -11.225877
## h(Percent_of_Ottawa_residents_5_2_dose-9) -3.964935
## h(81-Percent_of_Ottawa_residents_18_2_dose) -0.431949
## h(Percent_of_Ottawa_residents_18_2_dose-81) 22.658149
## h(91-Percent_of_Ottawa_residents_50_2_dose) 0.913697
## h(Percent_of_Ottawa_residents_50_2_dose-91) 21.761615
## h(76-Percent_of_Ottawa_residents_60_1_dose) * h(91-Percent_of_Ottawa_residents_50_2_dose) -0.005725
## h(Percent_of_Ottawa_residents_60_1_dose-76) * h(91-Percent_of_Ottawa_residents_50_2_dose) -0.029524
## h(93-Percent_of_Ottawa_residents_70_1_dose) * h(91-Percent_of_Ottawa_residents_50_2_dose) -0.009494
## h(4-Percent_of_Ottawa_residents_80_1_dose) * h(81-Percent_of_Ottawa_residents_18_2_dose) -0.064483
## h(Percent_of_Ottawa_residents_80_1_dose-4) * h(81-Percent_of_Ottawa_residents_18_2_dose) 0.002027
## h(93-Percent_of_Ottawa_residents_70_1_dose) * h(91-Percent_of_Ottawa_residents_50_2_dose) * h(2-Percent_of_Ottawa_residents_80_2_dose) 0.001192
## h(Percent_of_Ottawa_residents_70_1_dose-93) * h(91-Percent_of_Ottawa_residents_50_2_dose) * h(Percent_of_Ottawa_residents_80_2_dose-92) 1.591627
##
## Selected 16 of 21 terms, and 8 of 19 predictors (nprune=20)
## Termination condition: RSq changed by less than 0.001 at 21 terms
## Importance: Percent_of_Ottawa_residents_5_2_dose, ...
## Number of terms at each degree of interaction: 1 8 5 2
## GCV 19.65996 RSS 6403.543 GRSq 0.9850117 RSq 0.9877031
From the cross-validation results it is clear that the best model is the one with 3
degrees of interaction and nprune = 20; nprune = 25 gives an identical RMSE.
Using the best model to predict response variable for the train and test set.
Root Mean Squared Error for the MARS Model on Test data:
## [1] 3.036919
MAPE on test set
## [1] 0.1825252
Root Mean Squared Error for the MARS Model on train set:
## [1] 4.006118
MAPE on train set
## [1] 0.1885106
Plots comparing actual test data and predicted test data by MARS model
The errors in the MARS model are much smaller than those obtained from Simple Linear Regression. There are a few outliers, but the Q-Q plot shows that the errors are close to normally distributed.
MARS scans each predictor to identify splits that improve predictive accuracy, so non-informative features are not chosen. Furthermore, highly correlated predictors do not impede predictive accuracy as much as they do with OLS models.